KD-Tree

What is a KD-Tree?

A KD-Tree (short for k-dimensional tree) is a way to organize points in 2D, 3D, or higher dimensions so we can quickly find nearby points. Think of it like a binary search tree, but instead of sorting plain numbers, it sorts coordinates like (x, y) or (x, y, z). Each level of the tree splits on one dimension (x, y, or z), cycling through the dimensions as it goes deeper.

Example:

  1. At level 0, it compares x values to split left and right.
  2. At level 1, it compares y values.
  3. At level 2 (in 3D), it compares z values, and so on.

This way, the KD-Tree divides space into rectangles (or boxes), helping us narrow down which area to search for nearby points.

Where It's Used:

  1. Finding closest neighbors in maps or games (e.g., "what's the nearest enemy?")
  2. Robotics (e.g., nearest obstacle)
  3. Machine learning (e.g., KNN classifiers)
  4. 3D simulations and computer graphics

Strengths:

  1. Very fast for nearest neighbor and range queries
  2. Great for organizing spatial data in 2D/3D
  3. Easy to visualize and understand

Weaknesses:

  1. Slows down in very high dimensions (10D+)
  2. Needs rebalancing after many insertions/deletions
  3. Performance drops with uneven or clustered data

KD-Tree Operations

Construction:

  1. Initialize: Start with all points in root node
  2. Recursive Split:
    • Even depths: Split space vertically (x-coordinate)
    • Odd depths: Split horizontally (y-coordinate)
    • Choose median point for balanced splits
  3. Terminate: When nodes contain ≤ threshold points
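The construction steps above can be sketched as a minimal 2D version in Python. The names `Node` and `build` are illustrative, not from the original; this sketch splits at the median of the current axis and alternates axes by depth:

```python
# Minimal 2D KD-tree construction sketch (illustrative, not production code)
class Node:
    def __init__(self, point, left=None, right=None):
        self.point, self.left, self.right = point, left, right

def build(points, depth=0):
    if not points:
        return None
    axis = depth % 2                               # cycle: x (0), then y (1)
    points = sorted(points, key=lambda p: p[axis]) # order along split axis
    mid = len(points) // 2                         # median keeps tree balanced
    return Node(points[mid],
                build(points[:mid], depth + 1),
                build(points[mid + 1:], depth + 1))
```

Sorting at every level gives O(n log² n) build time; the O(n log n) bound quoted below assumes a linear-time median selection instead of a full sort.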

Insertion:

  1. Traverse the tree, comparing the new point's coordinate on each level's split axis
  2. Recurse left or right depending on the comparison
  3. Insert the point at the empty leaf reached, updating the bounding region
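These insertion steps can be sketched as follows (a hypothetical minimal 2D version; the `Node` and `insert` names are illustrative):

```python
# Minimal 2D KD-tree insertion sketch (illustrative)
class Node:
    def __init__(self, point):
        self.point, self.left, self.right = point, None, None

def insert(root, point, depth=0):
    if root is None:
        return Node(point)          # reached an empty leaf: place point here
    axis = depth % 2                # compare on this level's split axis
    if point[axis] < root.point[axis]:
        root.left = insert(root.left, point, depth + 1)
    else:
        root.right = insert(root.right, point, depth + 1)
    return root
```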

Deletion:

  1. Find the node matching the point by following its split axes down the tree
  2. If it is a leaf, remove it directly; otherwise find a replacement (e.g., the minimum node along the split axis in a subtree)
  3. Rebuild or relink the subtree rooted at the deleted node
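A simplified sketch of deletion follows; instead of finding a replacement node along the split axis, it rebuilds the affected subtree from its remaining points, which is easier to follow but does more work. All names here are illustrative:

```python
# Simplified 2D KD-tree deletion sketch: rebuild the subtree rooted at the
# deleted node. Real implementations usually swap in the minimum node along
# the split axis instead of rebuilding.
class Node:
    def __init__(self, point, left=None, right=None):
        self.point, self.left, self.right = point, left, right

def build(points, depth=0):
    if not points:
        return None
    axis = depth % 2
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return Node(points[mid], build(points[:mid], depth + 1),
                build(points[mid + 1:], depth + 1))

def collect(node):
    # Gather every point stored in a subtree
    if node is None:
        return []
    return [node.point] + collect(node.left) + collect(node.right)

def delete(root, point, depth=0):
    if root is None:
        return None
    if root.point == point:
        # Rebuild this subtree from all of its points except the deleted one
        return build(collect(root.left) + collect(root.right), depth)
    axis = depth % 2
    if point[axis] < root.point[axis]:
        root.left = delete(root.left, point, depth + 1)
    else:
        root.right = delete(root.right, point, depth + 1)
    return root
```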

Nearest Neighbor Search:

  1. Traverse tree comparing target point with splitting planes
  2. Maintain priority queue of candidates
  3. Prune subtrees when bounding box is too far
  4. Backtrack to verify no closer points exist
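The search steps above can be sketched for the single nearest neighbor (the k-point version adds a priority queue of candidates). `Node`, `build`, and `nearest` are illustrative names, repeated here so the sketch is self-contained:

```python
# Minimal 2D nearest-neighbor sketch with pruning and backtracking
import math

class Node:
    def __init__(self, point, left=None, right=None):
        self.point, self.left, self.right = point, left, right

def build(points, depth=0):
    if not points:
        return None
    axis = depth % 2
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return Node(points[mid], build(points[:mid], depth + 1),
                build(points[mid + 1:], depth + 1))

def nearest(node, target, depth=0, best=None):
    if node is None:
        return best
    if best is None or math.dist(node.point, target) < math.dist(best, target):
        best = node.point
    axis = depth % 2
    diff = target[axis] - node.point[axis]
    # Visit the side of the splitting plane containing the target first
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, depth + 1, best)
    # Backtrack: the far side can only hold a closer point if the splitting
    # plane is nearer than the current best distance (otherwise prune it)
    if abs(diff) < math.dist(best, target):
        best = nearest(far, target, depth + 1, best)
    return best
```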

Range Queries:

  1. Check if current node's region intersects query area
  2. Recurse into both subtrees if needed
  3. Collect points within range at leaf nodes
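A range query over an axis-aligned rectangle can be sketched like this (illustrative names; `lo` and `hi` are the rectangle's lower-left and upper-right corners):

```python
# Minimal 2D rectangular range-query sketch with subtree pruning
class Node:
    def __init__(self, point, left=None, right=None):
        self.point, self.left, self.right = point, left, right

def build(points, depth=0):
    if not points:
        return None
    axis = depth % 2
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return Node(points[mid], build(points[:mid], depth + 1),
                build(points[mid + 1:], depth + 1))

def range_query(node, lo, hi, depth=0, found=None):
    if found is None:
        found = []
    if node is None:
        return found
    x, y = node.point
    if lo[0] <= x <= hi[0] and lo[1] <= y <= hi[1]:
        found.append(node.point)    # point lies inside the query rectangle
    axis = depth % 2
    # Recurse only into half-spaces that can intersect the rectangle
    if lo[axis] <= node.point[axis]:
        range_query(node.left, lo, hi, depth + 1, found)
    if node.point[axis] <= hi[axis]:
        range_query(node.right, lo, hi, depth + 1, found)
    return found
```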

Complexities

Operation      Average Case      Worst Case
Bulk Loading   O(n log n)        O(n log n)
Insertion      O(log n)          O(n)
Deletion       O(log n)          O(n)
Range Query    O(n^(1-1/k))      O(n)
k-NN Query     O(k log n)        O(kn)

Bulk Loading :

Building a KD-Tree involves recursively partitioning the space by alternating axes at each level. At each recursive step, we choose the median point along the current axis to ensure a balanced tree. With a linear-time selection algorithm, finding the medians costs O(n) total work per level, and a balanced tree has O(log n) levels, giving a total complexity of O(n log n). This balance helps ensure efficient querying later on.

Insertion:

To insert a new point, we start at the root and recursively compare the point’s coordinate to the current node’s coordinate on the corresponding axis. We follow a single path down the tree, choosing left or right based on the comparison. In a well-balanced tree, this path has a depth of log n, resulting in O(log n) time. However, if the tree becomes unbalanced (e.g., due to inserting sorted or clustered data), the depth can grow to n, leading to a worst-case complexity of O(n).

Deletion :

Deletion is more complex than insertion because we may need to restructure part of the tree. After finding the node to delete (which takes O(log n) on average), we have to replace it with a suitable candidate (e.g., the minimum node in the subtree along the current axis). This may involve scanning a subtree and rebalancing parts of the tree. So while average performance remains O(log n), worst-case deletion can degenerate to O(n) if the tree is skewed or contains degenerate structures.

Range Query :

A range query checks which points fall inside a given region (e.g., a rectangle in 2D). Instead of scanning all points, we prune entire subtrees if their bounding regions lie outside the query. On average, this leads to visiting about O(n^(1-1/k)) nodes, where k is the number of dimensions. We also have to return the m points that actually lie inside the query, so the total complexity becomes O(n^(1-1/k) + m). In the worst case—such as when the query covers the whole space—we may visit all nodes, giving O(n).

k-NN Query :

The k-nearest neighbors query searches for the closest k points to a given location. KD-Trees use a backtracking search with a priority queue to explore only the most promising branches first, pruning away regions that cannot contain closer points. On average, this gives O(k log n) time since we often don’t need to examine the entire tree. However, in the worst case (such as high dimensions or poor balance), the algorithm may need to explore nearly all nodes, leading to O(kn) time.
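The bounded priority queue mentioned above can be sketched with Python's `heapq`. Since `heapq` is a min-heap, distances are stored negated so the farthest current candidate sits at the top and can be evicted first. All names are illustrative:

```python
# k-NN sketch: bounded max-heap of candidates (negated distances) plus the
# same backtracking/pruning rule as single nearest-neighbor search
import heapq, math

class Node:
    def __init__(self, point, left=None, right=None):
        self.point, self.left, self.right = point, left, right

def build(points, depth=0):
    if not points:
        return None
    axis = depth % 2
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return Node(points[mid], build(points[:mid], depth + 1),
                build(points[mid + 1:], depth + 1))

def knn(node, target, k, depth=0, heap=None):
    if heap is None:
        heap = []
    if node is None:
        return heap
    d = math.dist(node.point, target)
    if len(heap) < k:
        heapq.heappush(heap, (-d, node.point))
    elif d < -heap[0][0]:                       # closer than worst candidate
        heapq.heapreplace(heap, (-d, node.point))
    axis = depth % 2
    diff = target[axis] - node.point[axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    knn(near, target, k, depth + 1, heap)
    # Search the far side only if it could still contain a closer point
    if len(heap) < k or abs(diff) < -heap[0][0]:
        knn(far, target, k, depth + 1, heap)
    return heap
```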

Demo Controls:

  1. Deletes all points in the tree.
  2. Generates a new tree with 10 random points.
  3. Adds 5 randomly generated points to the current tree.
  4. After clicking this, you can pick any spot on the canvas as a query point, and its nearest 5 neighbors will be highlighted.